feat(q4_0): first-class Q4_0 core format + scalar kernel + SPI by michalharakal · Pull Request #648 · SKaiNET-developers/SKaiNET

michalharakal · 2026-05-30T17:39:37Z

First of a stacked series promoting Q4_0 (older GGML 4-bit, 18 bytes / 32 elements) from a JVM/MemSegment-only side-path to a first-class quantized format — mirroring how Q8_0 is wired — so any loader can produce it and any backend can specialize it.

Stacked on #647 (API-dump resync). Base will retarget to develop once #647 merges. The .api delta here is only the ~69 Q4_0 lines.

What's in this PR (Phase A, part 1)

- commonMain heap type: Q4_0TensorData interface + Q4_0BlockTensorData (ByteArray-backed, PackedBlockStorage, toFloatArray()), plus TensorEncoding.Q4_0 (32 elems / 18 bytes).
- Kernel SPI: Q4_0MatmulKernel interface + KernelProvider.matmulQ4_0() (default null) and a "Q4_0" case in supports().
- Scalar kernel: ScalarQ4_0MatmulKernel (portable commonMain floor) wired via ScalarKernelProvider.
- Dispatch: DefaultCpuOpsJvm lazy q4_0MatmulKernel (KernelRegistry) + is Q4_0TensorData -> branch in chooseQuantizedMatmul.
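
The SPI shape described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the names `Q4_0MatmulKernel`, `KernelProvider`, `matmulQ4_0()` and `supports()` come from the description, while the `matmul` parameter list, the `name` property, the `supports()` semantics, and the `NoQ4Provider`, `StubQ4Provider` and `resolveQ4_0` helpers are assumptions made for the example.

```kotlin
// Hypothetical kernel signature; the real SKaiNET interface may differ.
fun interface Q4_0MatmulKernel {
    fun matmul(input: FloatArray, packedWeights: ByteArray, output: FloatArray, inDim: Int, outDim: Int)
}

interface KernelProvider {
    val name: String

    // Default null: a provider that does not specialize Q4_0 needs no changes.
    fun matmulQ4_0(): Q4_0MatmulKernel? = null

    // Assumed semantics: a format is supported when a kernel is available for it.
    fun supports(format: String): Boolean = when (format) {
        "Q4_0" -> matmulQ4_0() != null
        else -> false
    }
}

// Provider that relies on the defaults and therefore offers no Q4_0 kernel.
object NoQ4Provider : KernelProvider {
    override val name = "no-q4"
}

// Provider that opts in by overriding matmulQ4_0(); the kernel body is a placeholder.
object StubQ4Provider : KernelProvider {
    override val name = "stub-q4"
    override fun matmulQ4_0(): Q4_0MatmulKernel =
        Q4_0MatmulKernel { _, _, output, _, _ -> output.fill(0f) }
}

// Hypothetical stand-in for a registry lookup: first provider with a Q4_0 kernel wins.
fun resolveQ4_0(providers: List<KernelProvider>): Q4_0MatmulKernel? =
    providers.firstNotNullOfOrNull { it.matmulQ4_0() }

fun main() {
    println(NoQ4Provider.supports("Q4_0"))
    println(StubQ4Provider.supports("Q4_0"))
    println(resolveQ4_0(listOf(NoQ4Provider, StubQ4Provider)) != null)
}
```

The nullable default is what lets existing providers compile unchanged while newer backends opt in by overriding a single method.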

Layout correctness note

Uses the canonical ggml split nibble layout (low nibbles → elements 0..15, high → 16..31; (code - 8) * d) matching DequantOps.dequantQ4_0FromBytes — not the interleaved layout the existing JVM MemSeg dotQ4_0BlockMemSeg uses. That mismatch is the likely reason the Q4_0 MemSeg path was never exercised; PR2 reconciles the MemSeg kernel to this layout.
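
For readers unfamiliar with the format, here is a minimal sketch of decoding one 18-byte Q4_0 block in the split layout: a little-endian fp16 scale `d` followed by 16 bytes of packed nibbles. It is not the PR's `toFloatArray()` or `DequantOps.dequantQ4_0FromBytes`; the function names `halfToFloat` and `dequantQ4_0Block` and the constants are made up for this example.

```kotlin
const val BLOCK_ELEMS = 32
const val BLOCK_BYTES = 18 // 2 bytes fp16 scale + 16 bytes of 4-bit codes

// Converts IEEE 754 binary16 bits to a Float without relying on platform helpers.
fun halfToFloat(bits: Int): Float {
    val sign = (bits ushr 15) and 0x1
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    val magnitude = when (exp) {
        0 -> mant / 16777216f // zero or subnormal: mant * 2^-24
        31 -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> Float.fromBits(((exp + 112) shl 23) or (mant shl 13)) // rebias 15 -> 127
    }
    return if (sign == 1) -magnitude else magnitude
}

// Decodes one block starting at `offset` using the split layout:
// low nibble of byte j -> element j, high nibble of byte j -> element j + 16.
fun dequantQ4_0Block(block: ByteArray, offset: Int = 0): FloatArray {
    require(offset >= 0 && offset + BLOCK_BYTES <= block.size) { "block out of range" }
    val lo = block[offset].toInt() and 0xFF
    val hi = block[offset + 1].toInt() and 0xFF
    val d = halfToFloat(lo or (hi shl 8))
    val out = FloatArray(BLOCK_ELEMS)
    for (j in 0 until 16) {
        val b = block[offset + 2 + j].toInt() and 0xFF
        out[j] = ((b and 0x0F) - 8) * d      // elements 0..15
        out[j + 16] = ((b ushr 4) - 8) * d   // elements 16..31
    }
    return out
}

fun main() {
    // Scale 1.0 (fp16 0x3C00); every code is 8 (value 0) except the first byte.
    val block = ByteArray(BLOCK_BYTES) { 0x88.toByte() }
    block[0] = 0x00.toByte()
    block[1] = 0x3C.toByte()
    block[2] = 0x1F.toByte() // low nibble 15 -> element 0, high nibble 1 -> element 16
    val values = dequantQ4_0Block(block)
    println("element 0 = ${values[0]}, element 16 = ${values[16]}")
}
```

With a scale of 1.0, the byte `0x1F` decodes to 7 at element 0 and to -7 at element 16, which makes the split placement easy to see.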

Tests

- Q4_0TensorDataTest: pins split layout + (code-8)*scale dequant against the canonical ggml decode.
- Q4_0MatmulDispatchTest: dispatch routes through the kernel and matches the scalar reference (single/multi-batch, dim×dim).
- KernelProviderSupportsTest: extended for the Q4_0 capability query.
- apiCheck green.
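
To show why pinning the layout in a test matters, the sketch below decodes the same 16 packed bytes under two nibble orderings and checks that they disagree. It is not the actual Q4_0TensorDataTest. The "interleaved" ordering here assumes that byte j feeds elements 2j and 2j+1, which is a guess at what the existing MemSeg path does; `splitCodes` and `interleavedCodes` are hypothetical names.

```kotlin
// Split layout: low nibbles fill elements 0..15, high nibbles fill 16..31.
fun splitCodes(quants: ByteArray): IntArray {
    require(quants.size == 16) { "expected 16 packed bytes" }
    val codes = IntArray(32)
    for (j in 0 until 16) {
        val b = quants[j].toInt() and 0xFF
        codes[j] = b and 0x0F
        codes[j + 16] = b ushr 4
    }
    return codes
}

// Assumed interleaved layout: byte j holds elements 2j (low) and 2j + 1 (high).
fun interleavedCodes(quants: ByteArray): IntArray {
    require(quants.size == 16) { "expected 16 packed bytes" }
    val codes = IntArray(32)
    for (j in 0 until 16) {
        val b = quants[j].toInt() and 0xFF
        codes[2 * j] = b and 0x0F
        codes[2 * j + 1] = b ushr 4
    }
    return codes
}

fun main() {
    // Byte j carries low nibble j and high nibble 15 - j, so the orderings are easy to tell apart.
    val quants = ByteArray(16) { j -> (((15 - j) shl 4) or j).toByte() }
    val split = splitCodes(quants)
    val interleaved = interleavedCodes(quants)

    check(split.take(16) == (0..15).toList())
    check(split.drop(16) == (15 downTo 0).toList())
    check(!split.contentEquals(interleaved))
    println("split:       ${split.joinToString(" ")}")
    println("interleaved: ${interleaved.joinToString(" ")}")
}
```

Both decoders read identical bytes, so a test that only checks value ranges or sums would pass for either one; asserting exact element positions is what separates them.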

Follow-ups (stacked)

PR2 Panama SIMD + MemSeg reconcile · PR3 Native FFM · PR4 FP32→Q4_0 quantizer + loader policy · PR5 docs. Targeting 0.27.0.

🤖 Generated with Claude Code

Promotes Q4_0 (older GGML 4-bit, 18 bytes / 32 elements) from a JVM/MemSegment-only side-path to a first-class quantized format that any loader can produce and any backend can specialize, mirroring Q8_0:

- commonMain `Q4_0TensorData` interface + `Q4_0BlockTensorData` (heap, ByteArray-backed) with `toFloatArray()` dequant and PackedBlockStorage.
- `TensorEncoding.Q4_0` (32 elems / 18 bytes).
- `Q4_0MatmulKernel` SPI + `KernelProvider.matmulQ4_0()` (default null) and a `"Q4_0"` case in `supports()`.
- `ScalarQ4_0MatmulKernel` (portable commonMain floor) wired through `ScalarKernelProvider`.
- `DefaultCpuOpsJvm`: lazy `q4_0MatmulKernel` resolved via KernelRegistry + an `is Q4_0TensorData ->` branch in `chooseQuantizedMatmul`.

Uses the canonical ggml *split* nibble layout (low nibbles → elements 0..15, high → 16..31, `(code - 8) * d`) matching `DequantOps.dequantQ4_0FromBytes`, NOT the interleaved layout the existing JVM MemSeg `dotQ4_0BlockMemSeg` uses (that mismatch is the likely reason the Q4_0 MemSeg path was never exercised; PR2 reconciles it).

Tests: Q4_0TensorDataTest (layout/dequant), Q4_0MatmulDispatchTest (scalar==dispatch), KernelProviderSupportsTest extended for Q4_0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

michalharakal mentioned this pull request May 30, 2026

feat(q4_0): Panama SIMD kernel + reconcile MemSeg to split layout #649

Merged

michalharakal merged commit 5ff5a36 into chore/resync-api-dumps May 30, 2026
5 checks passed

michalharakal deleted the feature/q4_0-core-format branch May 30, 2026 17:53

michalharakal mentioned this pull request May 30, 2026

docs(q4_0): changelog + quantized-kernels page for first-class Q4_0 #652

Merged

michalharakal merged 1 commit into chore/resync-api-dumps from feature/q4_0-core-format
